11 research outputs found

Best bang for your buck: GPU nodes for GROMACS biomolecular simulations

Author: de Groot Bert L.
Esztermann Ansgar
Fechner Martin
Grubmüller Helmut
Kutzner Carsten
Páll Szilárd
Publication venue: 'Wiley'
Publication date: 03/07/2015

The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well exploited with a combination of SIMD, multi-threading, and MPI-based SPMD/MPMD parallelism, while GPUs can be used as accelerators to compute interactions offloaded from the CPU. Here we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Though hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed since these cards do not support ECC memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime
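The cost argument in this abstract can be made concrete with a small calculation. The sketch below is only an illustration: the function names, the `cooling_overhead` parameter, and every number in the example are assumptions made up for this purpose, not values or methods from the paper. It adds the energy and cooling cost over a node's lifetime to the hardware price and then relates the total trajectory produced to that total cost.

```python
def lifetime_cost(hardware_price, power_watts, years, price_per_kwh,
                  cooling_overhead=0.5):
    """Hardware price plus energy and cooling cost over the node lifetime.

    cooling_overhead is a hypothetical factor: 0.5 means cooling adds
    50% on top of the electricity consumed by the node itself.
    """
    hours = years * 365 * 24
    energy_kwh = power_watts / 1000.0 * hours
    return hardware_price + energy_kwh * price_per_kwh * (1.0 + cooling_overhead)


def trajectory_per_cost(ns_per_day, hardware_price, power_watts, years,
                        price_per_kwh, cooling_overhead=0.5):
    """Nanoseconds of trajectory produced per currency unit of lifetime cost."""
    total_ns = ns_per_day * 365 * years
    cost = lifetime_cost(hardware_price, power_watts, years,
                         price_per_kwh, cooling_overhead)
    return total_ns / cost


if __name__ == "__main__":
    # Invented example nodes: (ns/day, hardware price, power draw in watts).
    nodes = {
        "CPU only": (10.0, 2000.0, 200.0),
        "CPU + consumer GPU": (25.0, 2500.0, 350.0),
    }
    for name, (ns_per_day, price, watts) in nodes.items():
        ratio = trajectory_per_cost(ns_per_day, price, watts,
                                    years=5, price_per_kwh=0.2)
        print(f"{name}: {ratio:.2f} ns per currency unit")
```

With these invented inputs the energy and cooling share exceeds the hardware price for both nodes, which is the situation the abstract describes for a lifetime of a few years; real conclusions require measured performance and power figures.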

arXiv.org e-Print Archive

PubMed Central

MPG.PuRe

More Bang for Your Buck: Improved use of GPU Nodes for GROMACS 2018

Author: de Groot Bert L.
Esztermann Ansgar
Fechner Martin
Grubmüller Helmut
Kutzner Carsten
Páll Szilárd
Publication date: 13/06/2019

We identify hardware that is optimal to produce molecular dynamics trajectories on Linux compute clusters with the GROMACS 2018 simulation package. Therefore, we benchmark the GROMACS performance on a diverse set of compute nodes and relate it to the costs of the nodes, which may include their lifetime costs for energy and cooling. In agreement with our earlier investigation using GROMACS 4.6 on hardware of 2014, the performance to price ratio of consumer GPU nodes is considerably higher than that of CPU nodes. However, with GROMACS 2018, the optimal CPU to GPU processing power balance has shifted even more towards the GPU. Hence, nodes optimized for GROMACS 2018 and later versions enable a significantly higher performance to price ratio than nodes optimized for older GROMACS versions. Moreover, the shift towards GPU processing allows to cheaply upgrade old nodes with recent GPUs, yielding essentially the same performance as comparable brand-new hardware.

Comment: 41 pages, 13 figures, 4 tables. This updated version includes the following improvements: - most notably, added benchmarks for two coarse grain MARTINI systems VES and BIG, resulting in a new Figure 13 - fixed typos - made text clearer in some places - added two more benchmarks for MEM and RIB systems (E3-1240v6 + RTX 2080 / 2080Ti

arXiv.org e-Print Archive

MPG.PuRe

GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

Author: Abraham Mark James
Hess Berk
Lindahl Erik
Murtola Teemu
Páll Szilárd
Schulz Roland
Smith Jeremy C.
Publication venue: The Authors. Published by Elsevier B.V.
Publication date: 01/01/2015

GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

Publikationer från KTH

Elsevier - Publisher Connector

Directory of Open Access Journals

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Highly Tuned Small Matrix Multiplications Applied to Spectral Element Code Nek5000

Author: Gong Jing
Hess Berk
Peplinski Adam
Páll Szilárd
Schlatter Philipp
Publication date: 01/01/2016

Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October, 6-7, 2016.

Nek5000 is an open-source code for simulating incompressible flows using MPI for parallel communication. In the Nek5000 code, the tensor-product-based operator evaluation can be implemented as small dense matrix-matrix multiplications. It is clear that the routines for calculating the matrix-matrix product dominate the execution time of Nek5000. In this paper, we conduct the optimization of matrix-matrix multiplication using SIMD intrinsics and the LIBXSMM package. The evaluation of the computational cost and optimization of these subroutines is not only applied to the CFD code Nek5000, but also to the NekCEM and NekLEM software, which share the same data structures with Nek5000.
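To illustrate the statement that tensor-product operator evaluation reduces to small dense matrix-matrix products, here is a minimal sketch in plain Python. It is not taken from Nek5000 and does not use LIBXSMM or SIMD intrinsics; the helper names and the tiny 2x2 difference operator are assumptions chosen only to keep the example readable. For a 2D element whose nodal values are stored as a matrix `u`, applying a 1D operator `d` along the first index is the product `d @ u`, and applying it along the second index is `u @ d^T`.

```python
def matmul(a, b):
    """Dense product of two small matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def apply_tensor_operator_2d(d, u):
    """Apply the 1D operator d along each index of the 2D nodal array u.

    Returns (along_first, along_second), where
      along_first[i][j]  = sum_k d[i][k] * u[k][j]   (d @ u)
      along_second[i][j] = sum_k d[j][k] * u[i][k]   (u @ d^T)
    """
    return matmul(d, u), matmul(u, transpose(d))


if __name__ == "__main__":
    # Hypothetical 2-point difference operator and nodal values u[i][j] = 2*i + j.
    d = [[-1, 1], [-1, 1]]
    u = [[0, 1], [2, 3]]
    along_first, along_second = apply_tensor_operator_2d(d, u)
    print(along_first)   # change of u per step in the first index
    print(along_second)  # change of u per step in the second index
```

Because each element needs only a few products of this size, the per-call overhead of a general-purpose matrix routine matters, which is the kind of cost the paper targets with specialized small-matrix kernels.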

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

GROMACS 4.6 heterogenous CPU-GPU acceleration

Author: Szilárd Páll (513226)

Control and data-flow of the heterogeneous parallelization in GROMACS 4.6. The diagram illustrates both normal MD steps (black lines) as well as those steps in which the pair-search and domain-decomposition are carried out (blue). In the latter, the additional transfer of the pair list from the CPU to the GPU and a subsequent pruning done in the CUDA kernel are also indicated.

FigShare

Intro to HPC for Life Scientists: Mapping computation to HPC hardware & GPU accelerators and heterogeneous architectures

Author: Berk Hess (8200138)
Szilárd Páll (513226)
Publication date: 20/03/2023

Lecture slides and exercises for BioExcel-PerMedCoE Introduction to HPC for Life Scientists: Mapping computation to HPC hardware: molecular simulation (lecture); GPU accelerators and heterogeneous architectures (lecture); Introduction to HPC: molecular dynamics simulations with GROMACS (exercises). Barcelona, March 8 2023.

FigShare

GROMACS 5.1 vs 2016 performance in GPU-accelerated ceramide pull simulations

Author: Magnus Lundborg (1275657)
Szilárd Páll (513226)

Computational modeling of the skin barrier, the lipid matrix of the stratum corneum, using molecular dynamics simulations. These studies give further insight into the largest organ in the human body and will further clinical experiments. Thanks to the highly optimized heterogeneous parallelization in GROMACS, complex computational studies can be carried out quickly and efficiently. The right panel shows the modeled molecular system with ceramide molecules in green, cholesterols in white, and fatty acids in red. Recent work focused on improving SIMD, GPU, and thread parallelization and resulted in speeding up the aforementioned calculations by up to 50%. The left panel illustrates the execution time breakdown of the CPU and GPU tasks in GROMACS versions 5.1 and 2016; improved performance of multiple tasks leads to an increase in simulation throughput from 61 ns/day to 95 ns/day. The simulations were performed on a workstation equipped with a Core i7-5960X CPU and a GeForce TITAN X GPU.

Benchmarks and performance illustration (left panel) by S.P., the ceramide system model and rendering by M.L.

FigShare

Direct-Space Corrections Enable Fast and Accurate Lorentz–Berthelot Combination Rule Lennard-Jones Lattice Summation

Author: Berk Hess
Christian L. Wennberg
Erik Lindahl
Hockney R. W.
Mark J. Abraham
Szilárd Páll
Teemu Murtola
Publication venue: 'American Chemical Society (ACS)'

Crossref